DOCUMENT RESUME

ED 244 736                                                  PS 014 296

AUTHOR         Cook, Thomas D.
TITLE          Which Questions Do We Need to Ask about Follow
               Through?: Some Thoughts Stimulated by Considering the
               Functions of Panel Studies.
SPONS AGENCY   National Inst. of Education (ED), Washington, DC.
PUB DATE       81
NOTE           22p.
PUB TYPE       Viewpoints (120)

EDRS PRICE     MF01/PC01 Plus Postage.
DESCRIPTORS    Early Childhood Education; *Program Evaluation;
               Research Design; *Research Methodology; Research
               Needs; Research Problems
IDENTIFIERS    Evaluation Problems; *Panel Studies; *Project Follow
               Through; Research Priorities; Research Suggestions

ABSTRACT
     National Institute of Education (NIE) research
priorities indicate the use of panel measures only during the time
children are in Follow Through programs and further seem to indicate
that only implementation variables should be measured. This paper
raises questions about the desirability of the narrow focus implied
by the NIE priorities, the opportunity costs that may be associated
with them, the nature of events that seem to have led to the adoption
of the narrow focus, and the validity of some of the assumptions on
which the focus seems to be based. The paper is divided into four
sections. In the first section, key concepts are defined. In the
second, concepts are used to describe the place of the NIE evaluation
plans within the overall policy space in which evaluations typically
take place. Explanations are offered as to why the plans are as they
are now. The third section explores how the priorities in the NIE
plans lead to a particular sort of panel study. Through consideration
of other sorts of panel studies, suggestions are offered concerning
how different questions might have been stressed. In the fourth and
final section, conclusions are drawn about how the NIE priorities
were established; questions are also raised concerning the
possibility of expanding research activities within the scope of
Strand I of the Follow Through evaluation. (RH)






***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made      *
*                   from the original document.                       *
***********************************************************************






U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

• This document has been reproduced as received from the person or
  organization originating it.

• Minor changes have been made to improve reproduction quality.

• Points of view or opinions stated in this document do not necessarily
  represent official NIE position or policy.


Which Questions Do We Need to Ask about Follow Through?:

Some Thoughts Stimulated by Considering the Functions of Panel Studies*

Thomas D. Cook
Northwestern University



*This paper was prepared at the request of the National Institute of Education






Introduction

I was asked to write about recent developments in the design and
analysis of panel studies, and to relate these developments to the provisional
plans that the National Institute of Education (NIE) has proposed for
evaluating Follow Through.

The paper uses its manifest purposes to explore a precondition for
how panel studies are used. The form of a panel study depends heavily on
the specific form of the research questions to be answered. I distinguish
three types of panel study, stressing that they can be merged in practice.
Repeated measures taken before beginning a program, project, or procedure
have most utility for estimating (and extrapolating) maturational trends
and group differences in such trends. Data on this issue help considerably in
making causal inferences. Panel data collected during a program, project,
or procedure can describe the nature and context of implementation, can
describe levels on outcome variables, and can be used to relate implementation
to outcomes. Panel data collected after an experience is over can help
estimate the persistence of any changes and help illustrate how any changes
are integrated into other aspects of life.

The major thesis of this paper is that NIE research priorities indicate
the use of panel measures only during the time children are in Follow
Through, and then seem to indicate that only implementation variables
should be measured. I raise questions about the desirability of the narrow
focus implied by the NIE priorities, about the opportunity costs that may
be associated with them, about the nature of the events that seem to have
led to this focus being adopted, and about the validity of some of the
assumptions on which the focus seems to be based.

The paper is divided into four sections. In the first we define some
key concepts. In the second we use these concepts to describe the place of
the NIE evaluation plans within the overall policy space in which evaluations
typically take place. We also seek to explain why the plans are how they
are. In the third section we show how the priorities in the NIE plans
lead to a particular sort of panel study and suggest, by considering
other sorts of panel study, how different questions might have been stressed.
In the final section, we draw conclusions about how the NIE priorities
were arrived at, whether more questions cannot be answered without undue
cost increases, whether it is sufficiently realized that one can show
that services are implemented well that are of little utility, whether the
time frame of Strand I has been utilized fully in exploring whether
questions of a different type might be broached, and whether new demonstration
projects are needed in Strand I -- at least initially.






Some Terminology

What is a Panel Study? Panel studies involve measuring at least two
constructs on at least two occasions on the same units. The units are
usually individuals, but need not be. Though two waves is a minimum,
theorists and practitioners of panel studies are unanimous in stressing the
desirability of more waves. I concur with their opinion. However, the
resources for multiple waves have to come from resources that could have
been used for other evaluation purposes, and so the number of waves influences
the total set of research questions that will and will not be asked. The
measurement of two constructs is also a minimum, and we will see later that
modern "modeling" techniques require more constructs than this.

Panel studies are also characterized by the measurement of persons
who have not previously been stratified into different comparison groups.
Thus, in studies of television viewing and violence, a panel study would
require a large sample of children whose levels of viewing and violence
are measured at different times. From this measurement would emerge
information about children who differ in viewing. These differences could
then be considered as treatment contrasts, even though they were not initially
designed for this purpose. (The researcher might, though, have planned the
sample selection to ensure wide variability in presumed television viewing,
for without such variability many research questions cannot be answered.)
The absence of explicitly designated comparison groups (especially no-treatment
control groups) makes it possible to do away with a feature of experimental
design that costs money, causes political headaches, and sometimes fails to
serve its intended baseline function.

The Units about which Evaluation Questions are asked. Some evaluation
questions focus on the program as the unit of study. In the Follow Through
case, this would involve phrasing questions about Follow Through (as opposed
to not being in Follow Through or being in some program with similar-
appearing goals that is aimed at all or part of the same population).
Programs rarely die; it is also likely that major decisions about programs
depend on political processes, and not on research results. Certainly, it
seems from several sources that Congress' intent is not to pass judgment
on Follow Through, which is already widely seen as an established service
program. For these reasons, inferences at the program level are
presumably of little interest to many persons.

Other evaluation questions focus on what I call "project types," a
conception that is close to what were called curriculum-based "models"
in the original Follow Through. With models, the principal aim is to
compare different curricula and modes of delivering curricula that have some
similar-appearing aims, though they will also differ from each other in
their unique aims and in the unique emphases they place on any shared aims.
The difficulties involved in comparing models are legion, including (a)
how to deal with differences in aims; (b) the political conflicts associated
with a hot-house horse race; and (c) the heterogeneity of projects funded
from the same model. Also, some theorists of education are convinced that
prior evaluations have adequately informed us of the models that are more
effective in meeting particular goals. For such persons, "success" depends
on getting curricula implemented rather than on getting better curricula
designed.

Questions can also be asked about projects. In the Follow Through
context these are the specific sites that have adopted a model sponsor.
Typical questions about projects revolve around detecting particularly
"successful" projects (hopefully according to a set of heterogeneous
criteria), or relating projects that stress particular activities to
changes in particular criteria.

Procedures are also often the focus of evaluations. By procedures I
understand activities that occur within projects to achieve subgoals that
are presumed to be instrumental in furthering larger goals. In the
Follow Through context, procedures of interest might include: how to
increase parental involvement; how to increase the time spent on learning
tasks, etc. In the NIE materials I was sent these procedures are often
referred to as "services."

Evaluations can be designed around other units (e.g., policies,
products, personnel, etc.), and an evaluation may involve trying to answer
questions about several different units. It should also not be forgotten
that evaluations can be designed to discover which units are most worth
asking questions about. Indeed, determining units is one of the more
important functions of evaluability assessment, or what NIE and ASPE have
called "exploratory evaluation" in the Follow Through context. It is my
understanding that an evaluability assessment of Follow Through has been
conducted, and I have consulted a November 1979 summary of the assessment
entitled "Update of the Follow Through Task Force Activities." This
update specifies that a decision has been reached to concentrate Follow
Through on services, with a 20% effort devoted to research for the purpose
of identifying ways of better implementing procedures that are common to
all or many Follow Through projects.

The Types of Evaluation Questions that are asked about any Unit. Irrespective
of whether programs, models, projects, procedures (or policies or products)
are at issue, most of the specific evaluation questions that are asked, and
most of the searching for important research questions that takes place, can
be codified as belonging to one of six types of research question.

First come questions about the clients, the audience. In Follow Through,
questions of this type refer to the number of children served, their
demographic profile relative to what is known of the desired target audience,
the numbers of parents who are involved in different ways, the profile of
involved parents, etc.

Second come questions about the nature, quantity, and presumed quality
of services that are delivered. Issues here concern inferring the
educational services that were and were not delivered and also describing
the educational context in which the delivery took place, for the context
will often facilitate or impede changes in whatever criteria are deemed
important. Theory, professional experience, and pilot-testing on site are
usually used to select the particular implementation variables examined.
Indeed, all three of these sources are evidenced in NIE's list of possible
themes for the Strand I evaluation to emphasize.

A third type of question is about effects. For many commentators on
evaluation, this is one of the more crucial and novel aspects of evaluation.
In this domain we want to know: How effective is a procedure, project,
model, program, etc. in bringing about X or Y for the clients of the program?
At issue, here, are both intended and unintended effects, short-term and





A fourth type of question relates to how a program influences higher-
order aggregates that may or may not include members of the client group.
I call questions of this type questions about impact, and in the Follow
Through case impact questions might refer to effects on families,
neighborhoods, other school curricula, other school programs, etc. Impacts are
harder to bring about and are more distal than effects on primary target
audiences (who, in the Follow Through case, are children).

A fifth type of question that I think useful is about financial
costs -- total costs, cost per unit per time interval, cost-effectiveness,
and, for the adventurous with an underdeveloped sense of the tenuous nature
of its assumptions, cost-benefit analysis.

Finally, evaluations often aspire to asking questions about causal
process. The aim is to gain an understanding of the processes that mediate
particular patterns of implementation, effectiveness, or impact. Such a
concern presupposes the utility of differentiating between simple causal
relationships (of the form: When I flick the switch the light goes on)
and more complex explanatory processes (the light goes on because the
current passes along the wire, it strikes a filament in a light, etc.).
Knowledge of causal processes helps design more efficient procedures for
delivering services than does the more "black box" knowledge associated
with identifying dependable causal relationships whose mediating mechanisms
are not known well.

Many reasons exist for suggesting the utility of these distinctions
I have made about six types of evaluation. One of them will hopefully
emerge in the next section.






The Policy Space for the proposed Strand I of the Follow Through Evaluation

Locating the Strand I Evaluation Plans. The NIE documents I have seen presume
that the unit for evaluation is the procedure (or service) and not the program,
model, project or anything else. They also presume that the major type of
question to be asked is about success in implementation. I draw these
inferences from the "illustrative themes for pilot projects" contained in
the document "Plans for Follow Through Research and Development" by NIE
staff members. The themes require identifying:

- Means to increase instructional time in Follow Through classrooms
  through improved management of services;

- New patterns of in-service training and selection of teachers to gain
  better instructional management, including cooperative agreements
  between schools, teacher training institutions and teacher associations
  or unions;

- New ways to systematically involve parent and community groups in
  planning and conduct of Follow Through programs, including the use
  of parents and families to provide instruction in the home;

- New uses of information systems, including testing and evaluation
  results, to bring better diagnostic and prescriptive information
  to bear on Follow Through student learning needs;

- New ways to facilitate support of school building and district
  administrators for the substantial changes typically required by
  innovative Follow Through procedures.

The policy space in which it is planned that evaluation of Follow
Through should take place is illustrated in the matrix below, where units
of evaluation are crossed with the major types of question posed in
evaluation. The one cell entry is where NIE proposes to be. It seems, from
the Plans for Follow Through Research and Development and from the summary
of the "exploratory evaluation" (or evaluability assessment), that the
rest of the matrix will remain unexplored territory.



Table 1. The Place of the proposed Strand I Evaluation Activities in
Evaluation Policy Space

                                      Unit of Evaluation
Question asked in
Evaluation                  Program    Model      Project    Procedure
----------------------------------------------------------------------------
How many children of
different backgrounds
are reached by the
program/model/project/
procedure?

How well is each of                                          Strand I of the
the units implemented?                                       proposed NIE
                                                             evaluation

How effective is the
unit on direct
recipients of its
services?

How impactful is the
unit?

What costs are
associated with it?

Which processes
mediate the observed
patterns of results?
----------------------------------------------------------------------------



Justifying the Strand I Evaluation Plans. The Strand I evaluation plans
seem to be the result of an "exploratory evaluation" (evaluability
assessment) conducted under the auspices of Joe Wholey when he was in
ASPE, together with some employees of USOE. I have had access to a
summary of the assessment and of the then Secretary of Education's reaction
to it. The discussion below is based on my reading of the summary as well
as on background knowledge of the history and politics of Follow Through.
My concern is to try to identify the bases on which a decision was made to
cast the evaluation as a research project aimed at inferring ways of
implementing important procedures or "services."

(a) The Choice to focus on Procedures or "Services." Let us consider why
procedures may have been chosen as the unit of evaluation. It may seem
particularly fruitless to study Follow Through at the program level when it
is already considered to be an established service, when it has a powerful
political constituency to support it, and when evaluations in any case rarely
influence global decisions about programs.

Alternatively, one could study models. But why do this when there is
as much variability (in learning "gains" at least) within models as between
them, and when the horse race between models causes political headaches
whose public manifestations serve to undermine the credibility of all Follow
Through evaluation efforts? Moreover, one might think that one already
knows which curriculum-based models are "effective," and that the major
remaining issues are how to get local authorities to sponsor such models and
how to get them implemented well when they are adopted.

One could alternatively examine projects. But why evaluate these
unless (a) there is a rapid turnover of Follow Through schools at the local
level, or (b) realistic plans exist to expand the number of Follow Through
sites and (c) NIE is in a position to affect the curriculum that is
implemented in new projects? It would seem that (c), together with either
(a) or (b), would have to be true to make worthwhile a focus on identifying
successful projects and then examining why they may be successful.

Procedures, on the other hand, are less threatening as units of evaluation.
They do not question the core rationale of a sponsor, and they are usually
not seen as relevant to major decisions about a project's fate. Also, some
procedures are common to a large number of projects or, like parental
involvement, are mandated for each project. Research on transferable
successful procedures may have an audience, therefore, particularly since
it is easier for projects to change specific procedures than to change
educational philosophy. Finally, with the level of background experience
we have in evaluating Follow Through, some would argue that we need greater
anthropological wisdom about what goes on in Follow Through classes before
we ask questions about such grandiose units as programs or models or even
schools as projects. One of the greatest sins of the original Follow
Through, as we now see with cheap hindsight, was premature grandiosity and
an inadequate modesty in the face of reality in general, and of Murphy in
particular.

The above argument, which I stress is partly hypothetical, is
superficially persuasive. As we shall soon see, however, it overlooks three
factors:

- first, one can modify procedures in projects whose basic
  conception is flawed and from which children benefit relatively
  little in obvious and important ways;

- second, the time frame assumed for evaluation in Follow Through
  might well allow a more differentiated evaluation strategy that
  moves from the evaluation of procedures to the evaluation of other
  units;

- third, the validity of some of the assumptions in the above argument
  is murky -- at least it is murky to me.

(b) The Choice to focus on Questions of Implementation. We turn now to the
decision to focus on questions of implementation, as opposed to focussing on
other types of question or to presuming that the questions worth
asking still needed to be discovered. I infer that the implementation of
procedures or services (i.e., how well is something done) is at issue,
rather than, say, the effectiveness of services (i.e., what effects does a
service have if it is conducted well and parents are better involved,
students spend more time at learning tasks, etc.) from the following
quotation from the OE paper entitled "Update of the Follow Through Task
Force Activities":

     The Follow Through program of the future will
     have two clear purposes -- first, it will provide
     effective comprehensive services to poor children
     in elementary schools in the nation; second, it
     will fund activities designed to improve our
     understanding of the ways that comprehensive
     educational services may be most effectively
     delivered to financially needy elementary
     school children.

Note that the reference here is to "the ways that services can be effectively
delivered" and not to ways in which demonstrably effective services can be
effectively delivered.

It is not difficult to see how the decision to focus on implementation
may have arisen. After all, effectiveness is usually more difficult to pin
down with confidence, and impact is even harder. Moreover, an explicit
interest in effectiveness and impact can lead to political battles, as can
cost issues. Besides, it is presumed by some that the current literature
indicates that time on task increases performance, and that parental
involvement solidifies political bases (among other things). Why, then,
should one examine the effectiveness of procedures that can be assumed to
be effective? Finally, one could argue that exploring implementation will
not only aid sponsors and adopting schools to implement better,
but will also provide powerful clues to help uncover causal mediating
processes.

But the same problems exist for a focus on implementation as for a focus
on procedures or services. We may well be examining how well procedures
are implemented that are not successful, or are only successful by certain
criteria, or are only successful under a restricted set of conditions.
Also, one has to ask whether a focus on the implementation of procedures
needs to be exclusive of a focus on the effectiveness of procedures.

Relating Panel Studies to Priorities in Evaluation Questions

Panel studies involve at least two waves of measurement. However,
the purposes to which longitudinal measurement is put depend in large part
on when the measurement is made. We distinguish three times -- before a
child enters Follow Through, during Follow Through, and after exiting from
it at the end of the third grade. We shall see that the different purposes
associated with the different times of measurement speak directly to the
issue we have just raised of justifying the dominant research questions.

Measures collected before Follow Through. Two or more waves of measuring
achievement, self concept, and background characteristics can serve a
very useful purpose if they are collected before a child begins in Follow
Through. Perhaps the major inferential problem that occurs with
effectiveness-oriented studies of children is the lack of information about
maturational trends and about group differences in such trends. Such trends
cannot be sensitively described with only one wave of pre-Follow Through data
(i.e., the normal "pretest"). Much more sensitive estimates are possible with
two or more pretest waves. Indeed, it is just this feature which provides the
rationale for the so-called "dry run experiment."

Collecting data prior to Follow Through is less of a priority the more
one focusses on implementation as opposed to effectiveness or impact and
on procedures as opposed to projects or models. However, in the Follow
Through case, the transfer from Head Start eligibility to Follow Through
eligibility may mean that for many children some measures are in their
"file." If they are available for enough students, are directly relevant,
and are of reasonable quality, then most of the advantages of more than one
pre-project measurement wave can be gained. The advantages are not iron-clad
guarantees that the observed maturational trends will continue into
the future for a specific type of child. Such extrapolation is the crucial
untested assumption that may, however, be partially probed using other
sources of data. But while two measurement waves prior to Follow Through
are no panacea, they are a vast improvement over current practice with a
single time of pretest measurement.
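
The logic of using two pretest waves can be sketched in a few lines. The
sketch below is only an illustration under stated assumptions: the scores,
the wave timings, and the function names projected_score and
gain_beyond_trend are hypothetical, and growth is assumed to be linear,
which is precisely the untested extrapolation assumption noted above.

```python
def projected_score(t1, y1, t2, y2, t3):
    """Linearly extrapolate a group mean from two pretest waves to time t3.

    Assumes (hypothetically) that the maturational trend is a straight line.
    """
    if t1 == t2:
        raise ValueError("pretest waves must be taken at different times")
    slope = (y2 - y1) / (t2 - t1)  # estimated maturational trend per unit time
    return y2 + slope * (t3 - t2)


def gain_beyond_trend(t1, y1, t2, y2, t3, y3):
    """Observed score at t3 minus the score expected from maturation alone."""
    return y3 - projected_score(t1, y1, t2, y2, t3)


# Hypothetical group means, not data from any Follow Through study.
# Times are in months; waves at 0 and 6 precede the program, wave at 18 follows.
expected = projected_score(0, 40.0, 6, 43.0, 18)
extra = gain_beyond_trend(0, 40.0, 6, 43.0, 18, 52.0)
print(expected, extra)
```

With a single pretest the slope cannot be estimated at all, so one wave can
describe a group's level but not its trend. Running the same calculation
separately for each comparison group gives a rough look at group differences
in trend.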

Measures collected during Follow Through. Multiple waves of measurement
during Follow Through offer the potential for (a) comprehensive description
of the services delivered, both in terms of quantity and quality; (b)
continuous assessment of performance measures at the level of school,
class, teacher, parent, and child; and (c) a chance to relate implementation
and effectiveness measures cross-sectionally and with time lags. A data set
with these three characteristics would go some way towards meeting the
temporal precedence and covariation criteria of causal inference; and with
enough grounded theory and experience and with high quality measurement
the data set might plausibly rule out many alternative interpretations to
preferred causal inferences.
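
As a minimal sketch of point (c), the fragment below relates an
implementation measure to an outcome measure both within a wave and across
waves. Everything in it is assumed for illustration: the classroom records
are invented, the variable names are hypothetical, and a plain Pearson
correlation stands in for whatever analysis an actual evaluation would use.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical classroom-level records, one entry per classroom.
time_on_task_w1 = [30, 45, 50, 60, 75]  # implementation measure, wave 1
achievement_w1 = [41, 44, 43, 47, 50]   # outcome measure, wave 1
achievement_w2 = [43, 48, 49, 53, 58]   # outcome measure, wave 2

cross_sectional = pearson(time_on_task_w1, achievement_w1)  # same wave
time_lagged = pearson(time_on_task_w1, achievement_w2)      # outcome measured later
print(round(cross_sectional, 2), round(time_lagged, 2))
```

The lagged coefficient speaks to temporal precedence and both coefficients
speak to covariation, but neither rules out alternative interpretations on
its own.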

A description of implementation processes is enhanced by multiple
measurement waves, since implementation is usually a dynamic process. To
measure at one time would give little sense of the learning and feedback
that go into improving implementation at the local level. Nor would it
assess as many of the intermittent outside forces that impinge on implementation
to improve or impede it. Moreover, if children are to be measured in terms
of the amount and quality of services they receive, this measurement is often
better the more it is based on the stable level of services a child
receives. Measures that depend on measurement on a single day or week may
be unstable because of time-bound factors that happen to increase or
decrease the value of observations at the time of measurement. Finally,
it should be noted that multiple waves of measurement give the researcher
a chance to be a student, to learn which features of the children's,
teachers' and parents' experiences deserve to be measured. Later waves can
therefore include new constructs that reflect such learning by the evaluators.
The current NIE question emphases seem to me to be most relevant to
questions that could be answered with multi-wave measurement of implementation
during Follow Through.

However, it is also possible during Follow Through to measure the
performance of children as well as the activities, etc. in which they
participate. In my reading, the measurement of outcomes plays little role
in the current plans for Follow Through. (However, I acknowledge that I
have not seen all the relevant documents and that the measurement of
performance may be taken for granted because of school testing practices.)
Major difficulties with the repeated measurement of performance include
political factors (what should be routinely monitored as child-level
outcomes, which "interest group" will object to particular measures) and
also technical factors (what can we measure well that is likely to change
in the time span of Strand I, and will response formats encourage memory
of prior responses?).

It is in relating procedures to performance that panel studies seem to
many to be most likely to be useful. Alas, though, the state-of-the-art
for doing so is imperfect. We know enough not to trust old dogs like
cross-lagged panel correlation because nothing can be bought with so few
assumptions, as Rogosa and Cook and Campbell have pointed out. We think
at present that we need to work within a framework of structural equation
models. To use such methods best, we also believe that one should
postulate many theory-based models of the causal relationship between
procedures and outcomes, and should put these into competition with each
other as opposed to testing the goodness of fit of only a single model.
Moreover, most of us believe that the constructs in these systemically
related models should be measured with at least two fallible operations
so that inference is at the level of latent constructs (factors).
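
Why two fallible operations per construct matter can be illustrated with a
far simpler relative of latent-variable modeling, the classical correction
for attenuation. This is not the structural equation approach itself, only
a sketch of the underlying idea. It assumes that the two measures of each
construct are parallel, so that their intercorrelation can serve as a
reliability estimate; the function name and the numbers are hypothetical.

```python
from math import sqrt


def disattenuated_correlation(r_xy, r_xx, r_yy):
    """Estimate the correlation between two latent constructs.

    r_xy: observed correlation between a measure of X and a measure of Y
    r_xx: correlation between two parallel measures of X (reliability of X)
    r_yy: correlation between two parallel measures of Y (reliability of Y)
    Uses the classical correction r_xy / sqrt(r_xx * r_yy).
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical values: an observed procedure-outcome correlation of .30,
# with reliabilities of .64 and .81 for the two constructs.
latent_r = disattenuated_correlation(0.30, 0.64, 0.81)
print(round(latent_r, 2))
```

In this invented case the observed .30 corresponds to a latent correlation
of about .42. With only one operation per construct the reliabilities
cannot be estimated from the panel data, and measurement error is
indistinguishable from a weak relationship.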

But some problems remain, and they are tough old ones. One involves
how to avoid specification error. Many answer to this that, in the absence
of multiply validated and grounded theories, one should take care to put
models into competition with each other, realizing that the exercise is
explicitly theoretical in a substantive sense and that, if the theory
were so good, one would not need to be doing the modeling! A second
problem is how to deal with reciprocal causal influences. Much work is
taking place on this thorny issue, but I am dubious that break-throughs
are near. Some reasonable shots can be made, as our friends in
macro-economic theory and methods are continually doing.

Multiple Waves after Follow Through. Follow Through projects are trivial
if any initial gains they cause fail to persist after Grade 3. Also, the
program, or any project or procedure within it, is less important if any
gains are not capitalized upon in a child's later career so that they help
him or her in other aspects of school or life outside of school. Apologists
for Follow Through might argue that the program is not responsible for
maintaining gains and for translating them into better performance in other
school or non-school areas. Such outcomes depend on other factors, most of
which are stacked against the kinds of children who are eligible for Follow
Through. The apologists are absolutely right in one sense, but may be
misleading in another. Programs do not exist in a social void, and if the
institutions and programs that a child experiences after Follow Through
do not capitalize upon the program, then its overall utility has to be
called into question. But such issues aside, the function of follow-up
panel measures taken after exiting from Follow Through would be to describe
the persistence of changes and to examine how such changes might facilitate
other changes in the school or non-school life of a child, parent or
teacher.

Measures taken after exiting from Follow Through at the end of the
third grade, or measures taken after "graduating" from some experience within
Follow Through, are more obviously related to effectiveness than
implementation questions. Moreover, where delayed measures were made in the
past, they seem to have been associated more with evaluating programs, models,
and projects than with evaluating procedures. This is not an inevitable
relationship, though I suspect it is a probabilistic one.

Mixing Times of Measurement. An evaluation can be designed to measure
constructs at several times, some of them coming before an experience to be
evaluated, others coming during it, and others coming after it. In other
words, the design of evaluations permits mixing times of measurement so as
to tap into the different strengths of measuring before, during and after.

The current NIE evaluation plan seems to lead researchers to the use of
a panel study with multiple waves of measures of how services are implemented,
and to indicate measurement during the program. I see no a priori need to
restrict oneself to implementation measures collected during Follow Through
(because performance measures can also be collected then), or to restrict
oneself to measures that are taken only during Follow Through.

Conclusions

I have tried to describe the policy space in which it is provisionally
planned that Strand I research and development activities should take place.
According to plans as I have read them, this space is defined by asking
questions about procedures (services) rather than programs, models, projects
or anything else, and by a concentration on the quality with which procedures
are implemented rather than on how effective, impactful, cost-effective,
etc. they are. Such a priority suggests the need for multiple waves of
measurement of implementation during Follow Through. I briefly sketched
the advantages of this, but did not dwell on ways of actually conducting the
research.

My major concern is not with how to use panel methods to implement NIE's
priorities. Rather my concern is with these priorities themselves.

- First, how were they originated? It seems from memoranda I have read
that a small OE and ASPE team puzzled through what the questions worth asking
were in consultation with other OE and Follow Through managers. Later, the
then Assistant Secretary of Education narrowed the questions further. Where
is the input from multiple stakeholders that we have come to expect in
evaluation? And if such input is now to emerge, how impactful can we
expect it to be since tentative plans have already been formulated by
powerful Federal groups with relatively homogeneous interests?

- Second, I suspect that I could design feasible evaluations to answer
a wider range of evaluation questions than I see addressed in the NIE
guidelines. Most of the policy space in my Table 1 is, for example,
ignored. Moreover, I am immodest enough to believe that I could do this at
little, if any, additional cost in terms of either money or the quality of
answer to questions about service implementation. Is there to be any chance
to expand on the restrictive set of questions in the current evaluation
plans? Can the net be cast wider to catch more fish?

- Third, I wonder if there is sufficient cognizance that one can
implement well services that are generally ineffective or that are effective
only under certain conditions. Do we really believe, for example, that if
children spend more time on an ineffectual task, they will learn? Time on
task, like parental involvement, the use of media materials and other services
mentioned by NIE, presumes effective curriculum materials, among other things.
Are we so confident with Follow Through that we know about such effective
materials, or that local schools will choose sponsors that have them? To
overstate the case, high quality implementation may be necessary for
important outcomes but it is not sufficient. Given this, can we study
implementation without relating it to effectiveness in the same study or
in an immediate follow-up study?

- Fourth, Strand I is long enough that a quasi-programmatic evaluation
plan might be developed that begins with a study about both the quality
and effectiveness of implementation; and then, in the next study, asks
how such procedures can be transferred and whether, once transferred, they
affect the crucial overall outcome measures of relevance to Follow Through.
(By the latter I do not mean just achievement!)

- Fifth, the NIE documents suggest setting up demonstration projects to
examine new patterns of implementation. But what is the rationale for such
projects, or for embedding new practices within old projects when considerable
variability probably already occurs across projects in each of the procedures
of interest (time on task, etc.)? Utilizing existing variability might
be quicker and cheaper than setting up demonstrations, and would allow one
to move on to other questions.

My concern is not to push any particular conception of how Follow Through
should be evaluated. It is to raise questions about the questions worth
asking about Follow Through. My motive for doing this was that the question
of question priorities emerges as soon as one considers panel studies and
the different purposes that are usually met by measures collected before,
during, or after an experience that is to be evaluated.






